ABSTRACT

A key science goal of upcoming dark energy surveys is to seek time evolution of the dark
energy. This problem is one of model selection, where the aim is to differentiate between
cosmological models with different numbers of parameters. However, the power of these surveys
is traditionally assessed by estimating their ability to constrain parameters, which is a
different statistical problem. In this paper we use Bayesian model selection techniques,
specifically forecasting of the Bayes factors, to compare the abilities of different proposed
surveys in discovering dark energy evolution. We consider six experiments (supernova
luminosity measurements by the Supernova Legacy Survey, SNAP, JEDI, and ALPACA, and baryon
acoustic oscillation measurements by WFMOS and JEDI) and use Bayes factor plots to compare
their statistical constraining power. The concept of Bayes factor forecasting has much broader
applicability than dark energy surveys.

1 INTRODUCTION

Uncovering the nature of dark energy in the Universe is perhaps
the greatest challenge facing cosmologists in coming years. In recent
months many proposed experiments to probe dark energy
have been defined, especially in response to a call for white papers
by the Dark Energy Task Force set up jointly in the US
by the NSF, NASA and DOE. These propose a variety of techniques
to constrain dark energy parameters, including the luminosity
distance-redshift relation of Type Ia supernovae (SNe-Ia), the
angular-diameter distance-redshift and expansion rate-redshift
relations measured by baryon acoustic oscillations, and use of weak
gravitational lensing to probe the growth rate of structures.

Following on from the heritage of CMB anisotropy studies, the
standard tool used to illustrate the power of a given instrument
or survey is a plot of the projected parameter errors around one
or more fiducial models, estimated using a Fisher information matrix
approach or likelihood analysis of Monte Carlo simulated data
(Knox 1995; Jungman et al. 1996; Zaldarriaga, Spergel & Seljak
1997; Bond, Efstathiou & Tegmark 1997; Efstathiou & Bond
1999). Typically, a projection of the parameter uncertainties onto a
two-parameter equation-of-state model for dark energy is deployed,
showing how tightly parameters are expected to be constrained
around, for instance, the cosmological constant model. The intended
implication is that if the true values lie outside those error
ellipses, then the survey will be able to exclude the cosmological
constant model.

However, the principal goal of such surveys is usually identified
as being the discovery of dark energy evolution. This is not
a parameter estimation question, but rather one of model selection
(Jeffreys 1961; MacKay 2003; Gregory 2005), where one seeks to
compare cosmological models with different numbers of variable
parameters. Within the framework of Bayesian inference, the statistical
machinery to make such comparisons exists, and is based
around statistics known as the Bayesian evidence and the Bayes
factor. The Bayes factor has the literal interpretation of measuring
the change in relative probabilities of two models in light of
observational data, updating the prior relative model probabilities to the
posterior relative model probabilities.

In this paper we use Bayesian model selection tools to assess
the power of different proposed experiments. Our method is related
to the Expected Posterior Odds (ExPO) forecasting recently
developed by Trotta (2005). The main difference is that he takes
the present observational constraints on the extended model, and
seeks to estimate the fraction of that parameter space within which
that model can be distinguished from a simpler embedded model.
By contrast, we take a theoretically-motivated view of the parameter
space of interest, and seek the locations within that parameter
space corresponding to dark energy models which are distinguishable
from a cosmological constant by a given experiment. We also
differ computationally, in that as well as using approximate techniques,
we use the nested sampling algorithm of Skilling (2004), as
implemented by Mukherjee, Parkinson & Liddle (2006), to compute
the evidences accurately numerically.

The paper is organized as follows. In Section 2 we introduce
model selection in the Bayesian framework. Section 3 describes
the dark energy surveys we make model selection forecasts for,
and Section 4 presents the results. We conclude in Section 5. We
consider some additional technical details and review the standard
parameter forecast procedure in two Appendices.


2 BAYESIAN MODEL SELECTION 
2.1 The model selection framework 

The Bayesian model selection framework has now been described 
in a variety of places (Jaffe 1996; MacKay 2003; Marshall, Hobson 
& Slosar 2003; Saini, Weller & Bridle 2004; Gregory 2005; Trotta 
2005; Mukherjee et al. 2006) and we will keep our account brief. 

In this context, a model is a choice of parameters to be varied
to fit the data, its predictions being reflected in the prior ranges
for those parameters. A model selection statistic aims to set up a
tension between model complexity and goodness of fit to the observed
data, ultimately providing a ranked list of models based
on their probabilities in light of data. Within Bayesian inference,
the appropriate statistic is the Bayesian evidence $E$ (also known
as the marginal likelihood), which is the probability of the data
given the model in question. It is given by integrating the likelihood
$P(D|\theta, M)$ over the set of parameters $\theta$ of model $M$, in light
of data $D$, i.e.

$$E(M) = P(D|M) = \int d\theta \, P(D|\theta, M) \, P(\theta|M), \qquad (1)$$

where the prior $P(\theta|M)$ is normalized to unity. The evidence is
thus the average likelihood of the model over its prior parameter
space. Rather than focusing simply on the best-fit parameters
(which will always tend to favour the most complex model available),
it additionally rewards models with good predictiveness.
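As a minimal numerical illustration of the evidence as a prior-averaged likelihood, the sketch below integrates a toy one-parameter Gaussian likelihood against a flat prior by simple quadrature. The likelihood width, the prior ranges and the function names are assumptions chosen for illustration and do not come from the surveys discussed here.

```python
import numpy as np

def evidence_flat_prior(log_like, lo, hi, n=20001):
    """Average the likelihood over a flat prior on [lo, hi] (trapezoid rule)."""
    theta = np.linspace(lo, hi, n)
    like = np.exp(log_like(theta))
    prior = 1.0 / (hi - lo)                      # flat prior, normalized to unity
    dtheta = theta[1] - theta[0]
    integral = dtheta * (like.sum() - 0.5 * (like[0] + like[-1]))
    return prior * integral

def toy_log_like(theta, mean=0.0, sigma=0.1):
    """Hypothetical Gaussian log-likelihood with peak value 1."""
    return -0.5 * ((theta - mean) / sigma) ** 2

narrow = evidence_flat_prior(toy_log_like, -1.0, 1.0)    # predictive model
wide = evidence_flat_prior(toy_log_like, -10.0, 10.0)    # ten times larger prior range
print(narrow, wide, np.log(narrow / wide))
```

Both toy models fit the data equally well at their best-fit point, yet the one with the ten times wider prior has a ten times smaller evidence, which is the sense in which the evidence rewards predictiveness.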

By Bayes' theorem the evidence updates the prior model probability
to the posterior model probability. The ratio of the evidences
of two models, $M_0$ and $M_1$, is known as the Bayes factor (Kass &
Raftery 1995):

$$B_{01} \equiv \frac{E(M_0)}{E(M_1)}. \qquad (2)$$


Note that the prior model probabilities are to be chosen in the
Bayesian approach, and different people may have different opinions
as to those. Nevertheless, everyone will agree on whether the
Bayes factor led to their original belief becoming more or less tenable
relative to another model in light of the data. In describing
results from Bayes factors, it is common to presume that the prior
model probabilities are equal, and we shall follow that practice;
anyone who thinks otherwise can readily recalculate the posterior
relative model probability.

The Bayesian evidence provides a ranked list of the models
in terms of their probabilities, obviating the need to specify an arbitrary
significance level as in frequentist chi-squared tests. Nevertheless
one still has to decide how big a difference will be regarded
as significant. A useful guide as to what constitutes a significant
difference between models is given by the Jeffreys’ scale
(Jeffreys 1961); labelling as $M_0$ the model with the higher evidence,
it rates $\ln B_{01} < 1$ as ‘not worth more than a bare mention’,
$1 < \ln B_{01} < 2.5$ as ‘substantial’, $2.5 < \ln B_{01} < 5$ as ‘strong’ to
‘very strong’ and $5 < \ln B_{01}$ as ‘decisive’. Note that $\ln B_{01} = 5$
corresponds to odds of 1 in about 150, and $\ln B_{01} = 2.5$ to odds of
1 in 13.
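The thresholds and odds quoted above can be checked with a few lines of code. This is only a sketch: the function names are hypothetical, the absolute value is taken so that either sign convention can be passed in, and equal prior model probabilities are assumed as in the text.

```python
import math

def jeffreys_label(ln_b01):
    """Label |ln B01| using the thresholds quoted in the text (1, 2.5, 5)."""
    x = abs(ln_b01)
    if x < 1.0:
        return "not worth more than a bare mention"
    if x < 2.5:
        return "substantial"
    if x < 5.0:
        return "strong to very strong"
    return "decisive"

def disfavoured_probability(ln_b01):
    """Posterior probability of the disfavoured model, assuming equal prior model probabilities."""
    return 1.0 / (1.0 + math.exp(abs(ln_b01)))

for ln_b in (0.5, 2.5, 5.0):
    p = disfavoured_probability(ln_b)
    print(ln_b, jeffreys_label(ln_b), "1 in %.1f" % (1.0 / p))
```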


2.2 Forecasts and the Bayes factor plot

In order to forecast the power of an experiment for model selection,
we ask the following question: Given a well-motivated simpler
model embedded in a larger parameter space, how far away
does the true model have to lie in order that the experiment is able
to exclude the simpler model? There are many such cases present
in cosmology, for example ΛCDM in the space of evolving dark
energy models, the question of whether we live in a spatially-flat
universe, or whether the initial power spectrum of perturbations is
exactly scale invariant, or exactly a power law, etc. Here we will
use the dark energy as a worked example. The Bayesian evidence
of models with dark energy has been computed from current observational
datasets by several authors (Saini et al. 2004; Bassett,
Corasaniti & Kunz 2004; Mukherjee et al. 2006), all finding that
the simple ΛCDM model is the preferred fit to present data. Our
aim here is to forecast its outcome in light of future datasets, in
order to assess the power of those surveys for model selection.

Our procedure is as follows. We first select an experimental
configuration. We then consider a set of ‘fiducial models’ characterized
by parameter values $\theta$, which we shall consider in turn to
be the true model. For each choice of fiducial model in our dark
energy space we generate a set of simulated data $D$ with the properties
expected of that experiment. We then compute the evidences
of the two models we seek to distinguish, here the ΛCDM model
and the general dark energy model. For definiteness, we choose to
assess a set of dark energy experiments by their ability to distinguish
a ΛCDM model from a two-parameter dark energy model
with equation of state given by

$$w(z) = w_0 + w_a (1 - a), \qquad (3)$$

where $w_0$ and $w_a$ are constants and $a$ is the scale factor. Although
the latter is sometimes referred to as the Linder parametrization
based on its use in Linder (2003), it appears to have been first
introduced by Chevallier & Polarski (2001).
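For concreteness, the sketch below evaluates this equation of state as a function of redshift, together with the dark energy density evolution that follows from integrating $3(1+w)/a$ and the resulting dimensionless expansion rate for a flat universe. The function names and the example parameter values are illustrative assumptions.

```python
import numpy as np

def w_cpl(z, w0, wa):
    """Equation of state w(z) = w0 + wa (1 - a), with a = 1 / (1 + z)."""
    a = 1.0 / (1.0 + z)
    return w0 + wa * (1.0 - a)

def de_density_ratio(z, w0, wa):
    """rho_DE(z) / rho_DE(0) obtained by integrating 3 (1 + w) / a analytically."""
    a = 1.0 / (1.0 + z)
    return a ** (-3.0 * (1.0 + w0 + wa)) * np.exp(-3.0 * wa * (1.0 - a))

def hubble_rate(z, omega_m, w0, wa):
    """H(z) / H0 for a flat universe with matter and evolving dark energy."""
    return np.sqrt(omega_m * (1.0 + z) ** 3
                   + (1.0 - omega_m) * de_density_ratio(z, w0, wa))

z = np.array([0.0, 0.5, 1.0, 1.7])
print(w_cpl(z, -0.9, 0.3))
print(hubble_rate(z, 0.27, -0.9, 0.3) / hubble_rate(z, 0.27, -1.0, 0.0))
```

Setting $w_0 = -1$ and $w_a = 0$ gives a constant dark energy density, recovering the cosmological constant case.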

Here $\theta$ refers to all the parameters of the model, but we are
principally interested in the dependence of the Bayes factor on the
extra parameters characterizing the extended model, here $w_0$ and
$w_a$. Our main plots therefore show the difference in log evidence
between the ΛCDM model and the two-parameter evolving dark
energy model, plotted in the $w_0$-$w_a$ plane. This is the Bayes factor
plot, which is presented in Section 4 for different dark energy
surveys, with contours showing different levels at which the two
models can be distinguished by data simulated for each experiment.

In general the Bayes factor is a function of all the fiducial parameters,
not just the dark energy ones. For the dark energy application
this dependence turns out to be unimportant, but for completeness
we discuss some issues relating to this in Appendix A.

Use of the Bayes factor plots to quantify experimental capabilities
is quite distinct, both philosophically and operationally, from
the use of parameter error forecasts; for readers unfamiliar with the
latter we provide a short review in Appendix B. We highlight the
advantages of the Bayes factor approach as follows:

(i) Most experiments, particularly dark energy experiments, are 
motivated principally by model selection questions, e.g. does the 
dark energy density evolve, and so should be quantified by their 
ability to answer such questions. 

(ii) In Bayes factor plots, the data are simulated at each point
of the dark energy parameter space that is to be confronted with
the simpler ΛCDM model, whereas parameter error forecasts are
plotted around only selected fiducial models (often just one). In
particular, in the latter case the data are usually simulated for a
model that people hope to exclude, rather than the true model which
would allow that exclusion.

(iii) The Bayesian model selection procedure accords special
status to the ΛCDM model as being a well-motivated lower-dimensional
model, which in Bayesian terms is rewarded for its
predictiveness in having a smaller prior volume. Parameter estimation
analyses do not recognize a special status for such models,
e.g. the same criterion would be used to exclude $w = -0.948$ as
$w = -1$. Model selection criteria provide a more stringent condition
for acceptance of new cosmological parameters than parameter
estimation analyses. Model selection analyses can also accrue positive
support for the simpler model, whereas parameter estimation
methods can only conclude consistency of the simpler model.

(iv) In parameter error studies, it is necessary that the simple
model is embedded as a special case of the second model. While
the models we discuss here are indeed of that type, the Bayes factor
could also be used to compare non-nested models (e.g. two different
types of isocurvature perturbation).

(v) Although it is not essential to do so, most parameter estimation
forecasts assume a Gaussian likelihood in parameter space,
while the Bayes factor plot uses the full likelihood.

Set against these advantages, the only disadvantages of the
Bayes factor method are that it is computationally more demanding,
and that its conceptual framework has yet to become as familiar
as that of parameter estimation.


2.3 Bayes Factor Evaluation 

We use the nested sampling algorithm (Skilling 2004; Mukherjee
et al. 2006), which is fast enough to enable exact evaluations of
the evidence for many fiducial parameter values. For comparison
we also compute results with the Savage-Dickey method outlined
in Trotta (2005), using a Fisher matrix approximation to the likelihood
about the true model, given as equation (B3) in Appendix B.
We discuss how the results from the two methods compare in one
case, and present our main results using the more accurate nested
sampling method.


2.3.1 Nested Sampling Algorithm 


The Bayes factor can be found by calculating the evidences of the
two models independently, and then taking their ratio. This method
requires integration over the extra cosmological parameters, which
does not feature in the Savage-Dickey method. Here we use our
implementation of the nested sampling algorithm (as described in
Mukherjee et al. 2006) to perform the integration. To quickly summarize,
the algorithm (Skilling 2004) recasts the problem as a one-dimensional
integral in terms of the remaining ‘prior mass’ $X$,
where $dX = P(\theta|M)\,d\theta$. The integral becomes

$$E = \int_0^1 L(X)\,dX, \qquad (4)$$


where $L(X)$ is the likelihood $P(D|\theta, M)$. The algorithm samples
the prior a large number of times, assigning an equal prior mass to
each sample. The samples are then ordered by likelihood, and the
integration follows as the sum of the sequence,


$$E = \sum_{j=1}^{m} E_j, \qquad E_j = \frac{L_j}{2} \left( X_{j-1} - X_{j+1} \right), \qquad (5)$$


where the lowest likelihood sample goes into the sum, and is discarded
to be replaced by a new sample selected under the condition
that it lies above the likelihood of the discarded sample. In this way
the algorithm works its way in to the highest likelihood peak.
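The loop just described can be sketched on a toy problem. The code below is not the implementation used for the forecasts: it assumes a two-dimensional isotropic Gaussian likelihood and a flat prior on a square, it exploits the known shape of that likelihood to draw replacement points efficiently, and it uses the expected prior mass $X_j = e^{-j/N}$ for $N$ live points with trapezoid-rule weights. All names and numerical choices are illustrative.

```python
import math
import numpy as np

SIGMA, LO, HI = 0.1, -1.0, 1.0    # toy likelihood width and flat prior box (assumed)

def log_like(theta):
    """Unnormalized isotropic Gaussian log-likelihood centred on the origin."""
    return -0.5 * float(np.sum(theta ** 2)) / SIGMA ** 2

def draw_above(rng, logl_min):
    """Draw from the flat prior subject to log_like > logl_min.

    For this toy likelihood the constraint is a disc of known radius, so we
    rejection-sample inside its bounding square clipped to the prior box.
    """
    r = SIGMA * math.sqrt(-2.0 * logl_min)
    a, b = max(LO, -r), min(HI, r)
    while True:
        theta = rng.uniform(a, b, size=2)
        if log_like(theta) > logl_min:
            return theta

def nested_sampling(n_live=300, seed=1, tol=1e-4):
    rng = np.random.default_rng(seed)
    live = rng.uniform(LO, HI, size=(n_live, 2))
    logl = np.array([log_like(t) for t in live])
    evidence, x_prev, j = 0.0, 1.0, 0
    while True:
        j += 1
        worst = int(np.argmin(logl))                 # lowest-likelihood live point
        x_next = math.exp(-(j + 1) / n_live)         # expected prior mass X_{j+1}
        evidence += 0.5 * math.exp(logl[worst]) * (x_prev - x_next)
        x_prev = math.exp(-j / n_live)               # X_j, used as X_{j-1} next time
        live[worst] = draw_above(rng, logl[worst])   # replace the discarded point
        logl[worst] = log_like(live[worst])
        if math.exp(logl.max()) * x_prev < tol * evidence:
            break
    evidence += x_prev * float(np.mean(np.exp(logl)))  # remaining live points
    return evidence

estimate = nested_sampling()
analytic = 2.0 * math.pi * SIGMA ** 2 / (HI - LO) ** 2   # Gaussian well inside the box
print(math.log(estimate), math.log(analytic))
```

In a realistic application the replacement point cannot be drawn this way, since the shape of the likelihood contour is not known in advance; repeating the run with different seeds gives an estimate of the scatter in $\ln E$.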

We compute the evidences using 300 live points, averaging 
over six repetitions of the calculation. This requires approximately 
10"^ likelihood evaluations per evidence computation. 


2.3.2 Savage-Dickey Formula 


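Bayes factors for two nested models can be computed using the
Savage-Dickey density ratio (Dickey 1971; Verdinelli & Wasserman
1995; see Trotta 2005 for an application to cosmological
model selection). Assuming a Gaussian approximation to the likelihood,
the Savage-Dickey formula for an extended model $M_1$ with
two free model parameters $(\theta_1, \theta_2)$ and flat priors $(\Delta\theta_1, \Delta\theta_2)$, versus
a simpler model $M_0$ with $\theta_1 = \theta_1^*$ and $\theta_2 = \theta_2^*$, is

$$B_{01}(\hat\theta_1, \hat\theta_2) = \frac{\Delta\theta_1 \Delta\theta_2}{2\pi \sqrt{\det F^{-1}}} \exp\left[ -\frac{1}{2} \sum_{\mu,\nu} (\hat\theta_\mu - \theta_\mu^*) F_{\mu\nu} (\hat\theta_\nu - \theta_\nu^*) \right], \qquad (6)$$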


where $F$ is the marginalized $2 \times 2$ Fisher matrix evaluated at $\hat\theta_\mu$.
Our conventions are defined in Appendix B, and we have used the
hat sign for the extended model parameters to emphasize that the
Bayes factors directly compare the fiducial models of the parameter
estimation analysis to the simpler nested model.

In our specific case, $M_1$ consists of all dark energy models
parametrized by different values of $w_0$ and $w_a$, while $M_0$ is the
cosmological constant model which is nested in $M_1$ with $w_0 = -1$
and $w_a = 0$. We use equation (6) to compute the Bayes factor
as a function of $w_0$ and $w_a$, to determine the range of dark energy
models that a given experiment is able to distinguish from ΛCDM.

From equation (6) we can see that the Bayes factor depends
on two multiplicative terms, namely an exponential factor and an
overall amplitude. The former accounts for the distance in the parameter
space of the model $M_1$ from $M_0$ in units of the forecasted
parameter uncertainty. The latter accounts for the fraction of the
accessible prior volume of the extended model $M_1$ in light of the
data, and hence this factor penalizes the model $M_1$ for having a
large parameter space compared to model $M_0$. As shown in Trotta
(2005), this factor can be interpreted as an estimate of the informative
content of the data,

$$I = \log_{10} \frac{\Delta\theta_1 \Delta\theta_2}{\sqrt{\det F^{-1}}}, \qquad (7)$$

being the order of magnitude by which the prior volume of model
$M_1$ will be reduced by the arrival of the forecasted data.
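A short sketch of how the Gaussian Savage-Dickey ratio and the information content might be evaluated is given below. It computes the ratio of the Gaussian posterior density at the nested point to the flat prior density, which is what the Savage-Dickey ratio reduces to under the stated approximation. The covariance matrix is a made-up example, not a forecast for any survey, and for simplicity the Fisher matrix is held fixed rather than re-evaluated at each fiducial point.

```python
import numpy as np

def ln_b01_savage_dickey(theta_hat, theta_star, fisher, prior_widths):
    """ln B01 for two extra parameters with flat priors, Gaussian likelihood."""
    d = np.asarray(theta_hat, float) - np.asarray(theta_star, float)
    fisher = np.asarray(fisher, float)
    cov_det = 1.0 / np.linalg.det(fisher)           # det(F^-1) of the 2x2 marginalized matrix
    amplitude = np.prod(prior_widths) / (2.0 * np.pi * np.sqrt(cov_det))
    return np.log(amplitude) - 0.5 * d @ fisher @ d  # amplitude term plus exponential term

def information_content(fisher, prior_widths):
    """log10 of the prior volume divided by sqrt(det F^-1)."""
    cov_det = 1.0 / np.linalg.det(np.asarray(fisher, float))
    return np.log10(np.prod(prior_widths) / np.sqrt(cov_det))

# Hypothetical marginalized covariance in (w0, wa), with a strong anti-correlation.
cov = np.array([[0.01, -0.036], [-0.036, 0.16]])
fisher = np.linalg.inv(cov)
widths = (1.667, 2.666)          # widths of the flat prior ranges on w0 and wa

for fiducial in [(-1.0, 0.0), (-0.8, 0.0), (-0.6, 0.0)]:
    print(fiducial, ln_b01_savage_dickey(fiducial, (-1.0, 0.0), fisher, widths))
print(information_content(fisher, widths))
```

The value at the nested point itself is set entirely by the amplitude term, while moving the fiducial model away from it lowers $\ln B_{01}$ through the exponential term.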


3 DARK ENERGY SURVEYS 
3.1 The surveys 

We have simulated observational data for two types of future dark
energy experiments: luminosity distance probes made through the
measurement of Type Ia supernovae, and angular-diameter distance
measurements from baryonic acoustic oscillations (BAO). Some of
the experiments considered have weak lensing parts too (SNAP,
JEDI, ALPACA), but we do not derive dark energy constraints from
simulated weak lensing measurements here.^1 Note that all these
experiments are presently undergoing optimization of their survey
structure which may improve their science return.

^1 Both SN-Ia and BAO are distance indicators, while weak lensing is sensitive
to growth and dark energy perturbations. Complementarity of weak

For the SNe-Ia, we compared four different surveys. The
CFHT SuperNovae Legacy Survey (SNLS) is already underway
but we consider the full five-year survey, while the SuperNovae
Acceleration Probe (SNAP) and the Joint Efficient Dark Energy
Investigation (JEDI) satellite missions, plus the Advanced Liquid-mirror
Probe for Astrophysics, Cosmology and Asteroids (ALPACA)
ground-based survey, are all proposed experiments. For all
experiments we assumed the same spread in magnitude $\delta m = 0.18$
of the supernovae, representing the combined effect of measurement
error and intrinsic dispersion in the light-curve corrected luminosity
[the intrinsic dispersion alone was recently estimated as
0.12 mag by SNLS (Astier et al. 2006)]. We used the number distribution
of SNe-Ia with redshift for the different surveys as outlined
in the literature; the total numbers used are 700, 2000, 13 000 and
86 000 for SNLS (Pritchet et al. 2004; Astier et al. 2006), SNAP
(Aldering et al. 2004), JEDI (Crotts et al. 2005) and ALPACA
(Corasaniti et al. 2005) respectively. We also assumed all surveys
would have support from an extra 300 nearby SNe-Ia observed by
ground-based telescopes in the redshift range $0.03 < z < 0.08$,
which also had a slightly smaller spread in magnitude ($\delta m = 0.15$).
We assumed no systematic errors in any of the magnitudes, only
statistical errors (except for one comparison case shown later).

For the baryonic acoustic oscillations we compared two different
surveys, the ground-based Wide-Field Fibre-fed Multi-Object
Spectrograph (WFMOS) and the satellite mission JEDI (JEDI will
perform both a SN-Ia survey and a BAO survey). The BAO surveys
measure both angular-diameter distance $D_A(z)$ and the Hubble
parameter $H(z)$ in a series of redshift bins. We calculated the
expected errors of the measurements in each bin using the Fisher
matrix approach of Seo & Eisenstein (2003), marginalizing over
the physical matter density $\Omega_m h^2$.

In order to obtain accurate results from experiments of these
types, it is necessary that strong degeneracies with the matter density
are removed by bringing in constraints from other sources. We
make the assumption that by the time these surveys are operative,
data compilations including Planck satellite observations will have
provided a measurement of $\Omega_m$ to an accuracy of ±0.01 (see for
example Pogosian et al. 2005). We include such a measurement by
adding an extra term to the likelihood centred around the fiducial
density parameter value. In the absence of such external information,
dark energy surveys would give a much poorer return. We will
briefly explore the effect of varying this assumption in Section 4.
Similarly in the BAO case we assume a 1% measurement uncertainty
on $\Omega_m h^2$ (see for example Tegmark et al. 2000).
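The way an external matter-density measurement enters as an extra likelihood term can be sketched for the supernova case as follows. This is a simplified illustration under stated assumptions: a flat universe, the two-parameter equation of state, a uniform hypothetical redshift distribution, and no marginalization over the absolute magnitude offset, which a real analysis would include. Only the magnitude spread of 0.18, the fiducial value 0.27 and the ±0.01 width are taken from the text.

```python
import numpy as np

def hubble_rate(z, om, w0, wa):
    """H(z) / H0 for a flat universe with the two-parameter equation of state."""
    a = 1.0 / (1.0 + z)
    de = a ** (-3.0 * (1.0 + w0 + wa)) * np.exp(-3.0 * wa * (1.0 - a))
    return np.sqrt(om * (1.0 + z) ** 3 + (1.0 - om) * de)

def distance_modulus(z, om, w0, wa, n=512):
    """5 log10 of the dimensionless luminosity distance (constant offset dropped)."""
    z = np.atleast_1d(z).astype(float)
    out = np.empty_like(z)
    for i, zi in enumerate(z):
        grid = np.linspace(0.0, zi, n)
        f = 1.0 / hubble_rate(grid, om, w0, wa)
        dz = grid[1] - grid[0]
        comoving = dz * (f.sum() - 0.5 * (f[0] + f[-1]))   # trapezoid rule
        out[i] = 5.0 * np.log10((1.0 + zi) * comoving)
    return out

def log_like(params, z, mu_obs, sigma_m=0.18, om_fid=0.27, om_err=0.01):
    """Supernova chi-squared plus a Gaussian term for the external matter-density measurement."""
    om, w0, wa = params
    resid = mu_obs - distance_modulus(z, om, w0, wa)
    chi2_sn = np.sum((resid / sigma_m) ** 2)
    chi2_om = ((om - om_fid) / om_err) ** 2
    return -0.5 * (chi2_sn + chi2_om)

rng = np.random.default_rng(0)
z_sim = rng.uniform(0.1, 1.7, size=200)                    # hypothetical redshifts
mu_sim = distance_modulus(z_sim, 0.27, -1.0, 0.0) + rng.normal(0.0, 0.18, size=200)
print(log_like((0.27, -1.0, 0.0), z_sim, mu_sim))
print(log_like((0.30, -1.0, 0.0), z_sim, mu_sim))
```

Dropping the `chi2_om` term leaves the matter density free to compensate for changes in the dark energy parameters, which is the degeneracy the external measurement is meant to remove.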


3.2 Priors 

The model priors are the parameter ranges over which the evidence
integral is carried out. Ordinarily in model selection these are supposed
to be the wide priors seen as appropriate when the model was
first considered, and not those motivated by current data. If one allows
the model priors to ‘follow the data’ into a small region of
parameter space, then model selection calculations will always be
inconclusive in the long term, as this requires each new experiment
to exclude a model again on its own, rather than the cumulative
effect of all observations.^2 The precise results for the Bayes factor
will have some dependence on the choice of priors (see below),
though the effect of the choices on model comparison or on survey
comparison is diminished as the same priors are used for the
common model parameters and the same priors are used for each
survey being compared.

lensing with SN-Ia/BAO will thus be very interesting in probing dark energy
more comprehensively.

Our choices are as follows. We only consider flat Universes,
so that the dark energy density parameter is $1 - \Omega_m$. For the model
priors, we impose the prior ranges $-2 < w_0 < -0.333$ and
$-1.333 < w_a < 1.333$ on the interesting parameters, and
$0 < \Omega_m < 1$ and $0.5 < h < 0.9$ on the other parameters (the Hubble
parameter is needed only for the baryon oscillation probes). The
fiducial values for $\Omega_m$ and $h$ are taken to be 0.27 and 0.7 respectively.

Note that for the phenomenological two-parameter evolving
dark energy model, the model priors on $w_0$ and $w_a$ that we have
chosen to work with are somewhat arbitrary. However if the prior
space were reduced for instance by a factor of 2, that would increase
the $\ln E$ of the evolving dark energy model by at most
$\ln 2 \simeq 0.69$, and this would not significantly affect our contours
or conclusions which are based on differences in $\ln E$ of 2.5 and 5.
We make a brief investigation of some prior dependences in Section 4.5.
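The size of this prior-volume effect can be illustrated with a toy calculation. The sketch below assumes an axis-aligned Gaussian likelihood, for which the evidence under a flat box prior is analytic; the likelihood widths are hypothetical, while the full prior ranges are those quoted above. Halving the $w_a$ range raises $\ln E$ by close to $\ln 2$ when the likelihood sits well inside the smaller box, and by less when the smaller box cuts into the likelihood.

```python
import math

def ln_evidence_gaussian_box(sigmas, centre, box):
    """ln evidence of an axis-aligned Gaussian likelihood (peak value 1) under a flat box prior."""
    ln_e = 0.0
    for s, c, (lo, hi) in zip(sigmas, centre, box):
        root2 = math.sqrt(2.0)
        # Fraction of the one-dimensional Gaussian lying inside [lo, hi].
        frac = 0.5 * (math.erf((hi - c) / (s * root2)) - math.erf((lo - c) / (s * root2)))
        ln_e += math.log(s * math.sqrt(2.0 * math.pi) * frac / (hi - lo))
    return ln_e

full_box = [(-2.0, -0.333), (-1.333, 1.333)]      # prior ranges on (w0, wa) from the text
half_box = [(-2.0, -0.333), (-0.6665, 0.6665)]    # wa range halved (illustrative choice)
centre = (-1.0, 0.0)

tight = (0.05, 0.2)    # hypothetical likelihood well inside both boxes
broad = (0.05, 1.0)    # hypothetical likelihood wider than the halved box

gain_tight = ln_evidence_gaussian_box(tight, centre, half_box) - ln_evidence_gaussian_box(tight, centre, full_box)
gain_broad = ln_evidence_gaussian_box(broad, centre, half_box) - ln_evidence_gaussian_box(broad, centre, full_box)
print(gain_tight, gain_broad, math.log(2.0))
```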

One should note that our conclusions also depend to some extent
on our chosen dark energy parametrization being able to describe
the true model. One could consider more general cases, such
as the four-parameter models of Corasaniti & Copeland (2003) and
Linder & Huterer (2005). For the purpose of assessing the power
of an experimental proposal, it seems reasonable to presume that
experiments capable of distinguishing two-parameter models are
likely also to be better under other parametrizations. If the effect
of dark energy were in fact a non-smooth variation in the equation
of state and a non-smooth variation of the expansion history
with redshift, then our results are too optimistic; the validity
of reparametrizing the observables, which are the expansion
history in different redshift bins as measured by the surveys, into
$(w_0, w_a)$ would need to be tested when the data arrive. Aspects of
parametrization have been explored in Wang & Tegmark (2004)
and Bassett et al. (2004).


4 RESULTS 

4.1 Comparison of calculational techniques 

We begin by comparing our two methods of computing the Bayes
factor, focussing on the SNAP mission supernova survey. The
Bayes factor plots are shown in Figure 1. In the left panel we plot
isocontours of Bayes factors in the $w_0$-$w_a$ plane inferred from the
nested sampling method. The plot shows the generic structure expected
of Bayes factor plots. In the central region, the simulated
data are for models very close to ΛCDM, so that model gives a good
fit and is further rewarded for its predictiveness, giving a positive
Bayes factor which would support ΛCDM over the dark energy
model. At the zero contour (the innermost one plotted) the models
fare equally well, and then at greater distances the dark energy
model becomes favoured. If the true parameters lie outside those
contours, SNAP will be able to exclude ΛCDM at the probability
corresponding to the contour level.

^2 An alternative, equivalent, view more in the Bayesian spirit is that one
can update the model prior ranges after new data, provided one also updates
the model probabilities and keeps track of them as well. In practice,
cosmological data analysis tends to re-apply a broad set of data to models
with wide priors each time, which is consistent with the model selection
philosophy.

Figure 1. The Bayes factor plot for the SNAP mission supernova survey. The left panel shows the calculation using nested sampling, and the right plot using
the Savage-Dickey formula with the Fisher information matrix. The contour levels are $\ln B_{01}$ equal to 0, $-2.5$ and $-5$.

Figure 2. This plot shows a parameter error forecast for the SNAP SN-Ia experiment,
taking ΛCDM as the true model. The contour levels indicate 68%
and 95%. While this figure uses the full likelihood, in this case a Gaussian
approximation using the Fisher matrix gives essentially identical results.

We see a strong degeneracy between the two parameters, 
meaning that supernova data are not good at constraining models 
in this particular parameter direction. This same degeneracy shows 
up in the usual Fisher matrix error projection method. Its precise 
direction depends on the redshift distribution of the supernovae. 

In the right panel we plot the projected Bayes factor contours
derived from the Savage-Dickey formula for the same experimental
characteristics and priors assumed in the previous case. We see that
this method gives generally good agreement with the nested sampling
computation, indicating that our calculations are robust. Some
slight differences are apparent, but this is expected as our version
of the Savage-Dickey method employs a Gaussian approximation
for the likelihood which may become poor at large distances from
ΛCDM, with the Fisher matrix method underestimating the covariance
matrix. For parameter estimation this is not a major concern,
since deviations from the Gaussian approximation occur in the tail
of the likelihood distribution, and quoted errors usually refer to the
68% confidence intervals. However model selection calculations
rely on good modelling well into the tails of the distribution.

Having verified that our methods give similar results, henceforth
we will show results from the nested sampling method, since
although it is computationally more intensive it does not assume a
Gaussian likelihood.


4.2 Comparison with parameter error forecasting 

In this paper we are strongly advocating use of Bayes factor plots
to quantify experimental capabilities, for the reasons enumerated
in Section 2.2. It is useful to see explicitly what differences this
gives as compared to the traditional parameter forecast approach,
and so Figure 2 shows a plot of likelihood contours, obtained from
a Markov chain Monte Carlo analysis of data simulated for the
SNAP supernova survey, using precisely the same assumptions as
Figure 1 and assuming that ΛCDM is the true model.

We see they share the same general shape, and that the same
principal parameter degeneracy is picked out. Obviously the two
plots are conceptually very different and so caution is needed in
comparing. We see that the Bayes factor contours are significantly
wider, indicating that model selection sets a more stringent condition
for dark energy evolution to be supported by the data. Indeed,
the 95% Fisher parameter contour lies within the $\ln B_{01} = 0$
contour where model selection gives the models equal probability,
hence by using the Fisher matrix plot we could rule out ΛCDM
with data that actually favour it. It is fairly generic for that to be
the case, indicating that 95% parameter estimation ‘results’ tend
not to be robust under more sophisticated statistical analyses. This
is a manifestation of Lindley’s ‘paradox’ as discussed by Trotta
(2005): parameter values rejected under a frequentist test
can nevertheless be favoured by Bayesian model selection.


4.3 Comparison of dark energy surveys 

We now turn to a comparison of the six dark energy surveys described
above. We stress once more that this comparison considers
the statistical uncertainties alone, and several of these experiments
are likely to be limited by systematics. The criteria that enable
the systematics to be most effectively minimized are likely to be
different from those giving experiments raw statistical power. Further
we are working under the limitation of the particular $w_0$-$w_a$
parametrization; dark energy in reality could be different.

Figure 3. Bayes factor forecasts for some future dark energy surveys. Contours are shown for $\log(B_{01}) > 0$, $-2.5$ and $-5$. An independent measurement
of $\Omega_m$ to ±0.01 is assumed. These plots show statistical uncertainties only, and several of these experiments are likely to be dominated by systematics.

Figure 3 shows the six surveys, the upper four being supernova
surveys and the lower two being the baryon acoustic oscillation
surveys. The innermost contoured region is where the evidence
of the ΛCDM model is greater than that of the evolving dark energy
model ($\ln B_{01} > 0$). The outer contours show $\ln B_{01} = -2.5$ and
$-5$, beyond which the data provide strong evidence in favour of the
evolving dark energy model. As with parameter estimation contours, the
smaller the contours the more powerful the experiment is.

As expected, we see a range of constraining powers depending
on the scale of the experiments. We also see that they broadly
share the same principal degeneracy direction, with slight rotations
visible from the different probing of redshift bins. The massive size
of the expected ALPACA dataset gives it the smallest contour area
amongst SN-Ia experiments, with its more limited redshift range
making its degeneracy more vertical.

Table 1. Two experimental figures-of-merit: the areas in the $w_0$-$w_a$ plane
where $\ln B_{01}$ exceeds $-2.5$, and the value of $\ln B_{01}$ at $w_0 = -1$ and
$w_a = 0$. The former measures the region of parameter space where the
experiment would not be able to exclude the ΛCDM model, while the latter
measures the strength with which the experiment would support ΛCDM
were it the true model. The $\ln B_{01}$ are additive between surveys and for
independent probes of dark energy within the same survey.

Experiment    Area    ln B_01(-1, 0)
SNLS          0.51    3.7
SNAP          0.35    4.5
JEDI SN       0.19    5.0
ALPACA        0.08    6.1
WFMOS         0.26    4.8
JEDI BAO      0.04    6.0

The baryon oscillation probes share almost the same principal
degeneracy as the SNe-Ia. Although they use the angular-diameter
distance rather than the luminosity distance, the two are related
by the reciprocity relation and hence follow the same degeneracy
shape, provided the uncertainties in each redshift bin have the same
redshift dependence. A probe which partly included the growth of
structure, such as weak lensing, would be expected to have a somewhat
different degeneracy; this has been shown using Fisher parameter
contours for the SNAP lensing survey, though the rotation is still
smaller than one would like.
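
For reference, the reciprocity relation invoked above can be written out explicitly. The symbols d_L, d_A and θ (standing for any cosmological parameter) are notation chosen here for illustration. Because the two distances differ only by a parameter-independent factor at fixed redshift, their fractional responses to a change in the parameters coincide:

```latex
d_L(z) = (1+z)^2 \, d_A(z)
\quad\Longrightarrow\quad
\left.\frac{\partial \ln d_L}{\partial \theta}\right|_{z}
= \left.\frac{\partial \ln d_A}{\partial \theta}\right|_{z} .
```

This is why the two probes trace the same degeneracy direction when their fractional errors have the same redshift dependence.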

Note that the logarithms of the Bayes factors are additive, so
if more than one of these surveys goes ahead, or if there are two
independent parts to a survey, then their Bayes factor plots can be
added together to give a net Bayes factor plot.
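
As a minimal sketch of this additivity, the snippet below sums the ln B_01 values quoted in Table 1 at the ΛCDM point for a chosen set of surveys. It takes the independence of the surveys as given, as the text does; the function name is purely illustrative, and the same sum would apply point by point across the whole w_0-w_a plane.

```python
# ln B_01 at the LambdaCDM point (w_0 = -1, w_a = 0), copied from Table 1.
LN_B01_AT_LCDM = {
    "SNLS": 3.7,
    "SNAP": 4.5,
    "JEDI SN": 5.0,
    "ALPACA": 6.1,
    "WFMOS": 4.8,
    "JEDI BAO": 6.0,
}


def combined_ln_b01(surveys):
    """Sum ln B_01 over surveys, assuming they are statistically independent."""
    return sum(LN_B01_AT_LCDM[name] for name in surveys)


# Example: a supernova survey combined with a baryon oscillation survey.
total = combined_ln_b01(["SNAP", "WFMOS"])
print(f"net ln B_01 at LambdaCDM: {total:.1f}")
```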

In addition to plotting Bayes factor contours, one can further
compress the information on how powerful an experiment is by
computing the area within a particular contour level, to give a
single 'figure-of-merit'. Table 1 summarizes these areas, expressed in
coordinate units, for the six experiments, showing the area where
ln B_01 exceeds -2.5. Note that this corresponds to the parameter
area in which an experiment cannot strongly exclude ΛCDM, and
hence small numbers are better. For a more extensive discussion
of figures-of-merit for optimization of dark energy surveys, in a
parameter estimation rather than model selection framework, see
Bassett (2005) and Bassett, Parkinson & Nichol (2005).
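
The area figure-of-merit can be sketched numerically as follows. The ln B_01 surface used here is a hypothetical toy (an untilted paraboloid peaking at the ΛCDM point), not a forecast for any survey, and the grid ranges are arbitrary; only the counting of grid cells above the threshold reflects the procedure described above.

```python
import numpy as np


def area_above(ln_b01, w0, wa, threshold=-2.5):
    """Area, in coordinate units, of grid cells where ln_b01 exceeds threshold."""
    cell = (w0[1] - w0[0]) * (wa[1] - wa[0])
    return float(np.count_nonzero(ln_b01 > threshold) * cell)


# Regular grid over the fiducial (w_0, w_a) plane (ranges are illustrative).
w0 = np.linspace(-1.5, -0.5, 501)
wa = np.linspace(-1.5, 1.5, 501)
W0, WA = np.meshgrid(w0, wa, indexing="ij")

# Toy surface: peak value 4.5 at (w_0, w_a) = (-1, 0), widths 0.1 and 0.3.
ln_b01 = 4.5 - 0.5 * (((W0 + 1.0) / 0.1) ** 2 + (WA / 0.3) ** 2)

area = area_above(ln_b01, w0, wa)
# For this toy surface the region is an ellipse, so the area is known exactly.
analytic = np.pi * 0.1 * 0.3 * 2.0 * (4.5 + 2.5)
print(area, analytic)
```

A real Bayes factor surface would be tilted along the degeneracy direction, but the cell-counting step would be unchanged.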






Figure 4. Bayes' factor forecasts for SNAP assuming different external
knowledge of Ω_m. Contours are again shown for ln B_01 = 0, -2.5 and
-5. A Gaussian external constraint on Ω_m is assumed, of width 0.03 (top
panel), reflecting approximately the current level of uncertainty on it,
and 0.003 (lower panel), reflecting an optimistic outcome.


4.4 Support for ΛCDM

We now consider the possibility of the experiments ruling out the
dark energy model in favour of ΛCDM, rather than the opposite
which we have focussed on thus far. Unlike parameter estimation
methods, Bayesian model selection can offer positive support in
favour of the simpler model. Because the simpler model is nested
within the dark energy model, it can never fit the data better, but it
can benefit from the volume effect of its smaller parameter space.
All one needs to do is read off the Bayes factor for the case where
the fiducial model is ΛCDM. Table 1 shows ln B_01 at w_0 = -1
and w_a = 0, i.e. when ΛCDM is the true model.

We find that this value is above 2.5 for all surveys, and above
5 for several of them. Thus many of the surveys are capable of
accumulating strong evidence supporting ΛCDM over evolving dark
energy. This can be seen as another figure of merit quantifying the
power of experiments. Note again that the ln B_01 are additive
between surveys and for independent probes of dark energy within
the same survey.

Note that the absolute value of this figure of merit is more
sensitive to the prior ranges chosen for the dark energy parameters,
which set the volume factor. However, the relative comparison of
surveys is again not affected by this.

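To give a feel for the thresholds of 2.5 and 5 used throughout, the short sketch below converts a value of ln B_01 into odds. It relies on the standard identification of the Bayes factor with the posterior odds when the two models are given equal prior probability, which is an assumption about the model priors rather than something fixed by the surveys; the function name is illustrative.

```python
import math


def odds_from_ln_bayes_factor(ln_b01):
    """Convert ln B_01 into the odds B_01 in favour of model 0."""
    return math.exp(ln_b01)


for threshold in (2.5, 5.0):
    odds = odds_from_ln_bayes_factor(threshold)
    print(f"ln B_01 = {threshold}: odds of about {odds:.0f} to 1")
```

The two thresholds thus correspond to odds of roughly 12:1 and 148:1 respectively.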

4.5 Variation of assumptions 

We end by examining the effect of varying some of the assumptions
that went into the calculations, focussing on the SNAP supernova
survey for definiteness. We do this in three ways: by changing
the presumed knowledge on Ω_m that complements the dark energy
survey, by looking at an alternative prior in the dark energy
model space, and finally by altering the assumed dispersion of
supernova luminosities and allowing for a simple model of
systematics.

As mentioned before, the return on dark energy surveys is
quite sensitive to the availability of external constraints to remove
parameter degeneracies, particularly Ω_m in the case of the
supernovae. Figure 4 shows this effect for the SNAP supernova survey,
with different constraints on Ω_m to be compared with our standard
assumption, used for the left panel of the earlier figure showing the
SNAP supernova survey. One sees a significant worsening of the Bayes
factor contours in the case of weaker knowledge on Ω_m.

Figure 5. SNAP Bayes' factor contours for the quintessence prior on w_0
and w_a. The lower left-hand region is cut off by the prior. The dashed
contour shows the location of the ln B_01 = -5 contour for our full
prior, as given in the earlier SNAP figure.

Figure 6. The main contours match the left panel of the earlier figure
showing the SNAP supernova survey. Additionally, the dashed contour shows
how the outer contour shifts under a simple modelling of systematics, and
the dot-dashed contour shows how the outer contour would move if the
magnitude error were smaller.

Altering the constraint on Ω_m can have a different effect
on different experiments. For instance, if it were more stringent,
then the difference between SNLS and SNAP or JEDI would be
greater; that is, the requirement for a more sensitive experiment
would be greater. Similarly the relative comparison is dependent on the
nature of dark energy itself; if a parametrization more complex
than w_0-w_a proved necessary, it would be more important to make
precise high-redshift observations (e.g. SNAP or JEDI versus
ALPACA).

Figure 5 modifies our assumptions in a different way, this time
altering the prior on the dark energy parameters. It assumes a prior
appropriate to quintessence models, namely that w ≥ -1 at all
redshifts. The evidence integral for the dark energy model is then
carried out over a narrower region in the dark energy parameters,
giving a boost to the evidence of the dark energy model relative to
ΛCDM. However, the effect is small; the dashed line shows where
the outer contour lay with our full dark energy prior and it has
shrunk in only marginally.

Caldwell & Linder (2005) classified quintessence models into
freezing and thawing models and delineated areas of the w_0-w_a
space where those models typically lie. According to our forecasts,
freezing models can only be decisively distinguished from ΛCDM
by the SNAP supernova survey if w_0 > -0.9, and thawing models
if w_0 > -0.87.

We have also investigated how changing the prior ranges on
the dark energy parameters alters the areas given in Table 1. In
this case we narrowed the priors on w_0 and w_a by a factor of two
in each direction. In cases where the posterior still lies within the
priors, this shifts the evidence by ln 4 ≈ 1.4 in favour of the dark
energy model. Unsurprisingly, we find this reduces the areas within
which ΛCDM cannot be excluded, typically by 10 to 20 per cent.
Importantly, however, this change preserves the rankings of the
experiments.
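
The ln 4 shift can be checked with a small numerical sketch. It assumes a toy two-parameter Gaussian likelihood that is separable and well contained within a uniform box prior; the widths and grid size are arbitrary illustrative choices, not values from our survey calculations.

```python
import numpy as np


def ln_evidence_uniform_prior(half_width, sigma=0.05, n=2001):
    """ln evidence for a separable 2D Gaussian likelihood (peak value 1)
    under a uniform box prior of side 2 * half_width centred on the peak."""
    x = np.linspace(-half_width, half_width, n)
    dx = x[1] - x[0]
    like_1d = np.exp(-0.5 * (x / sigma) ** 2)
    integral_1d = np.sum(like_1d) * dx           # simple Riemann sum
    prior_density = 1.0 / (2.0 * half_width) ** 2
    # Evidence = prior density times the 2D likelihood integral.
    return float(np.log(prior_density * integral_1d ** 2))


# Narrow the prior by a factor of two in each of the two directions.
shift = ln_evidence_uniform_prior(0.5) - ln_evidence_uniform_prior(1.0)
print(shift, np.log(4.0))
```

Since only the evidence of the two-parameter model changes, ln B_01 decreases by the same amount.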

Finally, in Figure 6 we examine how the outer contour would
shift if a smaller magnitude dispersion were achieved (we take
δm = 0.13), and separately under a standard (but crude) modelling
of possible systematics (see for example Kim et al. 2004).
The systematics have been modelled as an increased redshift-dependent
uncertainty in magnitude of (z/z_max) δm_sys per redshift
bin with δm_sys = 0.02 mag, and added in quadrature to the
(intrinsic) statistical uncertainty. For SNAP, this type of systematic
has quite a small effect. There can be other types of systematics
in the data, but we do not try to model them here as the ability of
different experiments to detect and (internally) resolve systematics
would be different and a proper study of systematics and the
required marginalization over them can only be done once the data
arrive.
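
A minimal sketch of this error model is given below. The redshift bins, supernova counts per bin and the 0.15 mag dispersion used for the statistical term are hypothetical placeholders, not the SNAP specification; only the (z/z_max) δm_sys form with δm_sys = 0.02 mag and the quadrature sum follow the description above.

```python
import numpy as np


def total_magnitude_error(z, sigma_stat, dm_sys=0.02, z_max=None):
    """Per-bin magnitude error: statistical term plus a systematic term
    (z / z_max) * dm_sys, combined in quadrature."""
    z = np.asarray(z, dtype=float)
    sigma_stat = np.asarray(sigma_stat, dtype=float)
    z_max = z.max() if z_max is None else z_max
    sigma_sys = (z / z_max) * dm_sys
    return np.sqrt(sigma_stat ** 2 + sigma_sys ** 2)


# Hypothetical binning, for illustration only.
z_bins = np.array([0.1, 0.5, 1.0, 1.7])
n_sn = np.array([50, 300, 500, 150])
sigma_stat = 0.15 / np.sqrt(n_sn)   # assumed dispersion of 0.15 mag per object

sigma_tot = total_magnitude_error(z_bins, sigma_stat)
print(sigma_tot)
```

In this toy setup the systematic term matters most in the highest-redshift bin, where it reaches its full value of 0.02 mag.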


5 CONCLUSIONS 

In this paper we have introduced the Bayes factor plot as a tool
for assessing the power of upcoming experiments. It offers a full
implementation of Bayesian model selection as a forecasting tool.
As compared to the traditional parameter error forecasting
technique, it offers a number of advantages enumerated in Section 2.
Amongst those, perhaps the most important are that observational
data are simulated at each point in the plane, rather than at a small
number of fiducial models, and that the Bayes factor plot properly
captures the experimental motivation as being one of model
selection rather than parameter estimation.

As a specific example, we have used the Bayes factor plots
to examine a number of proposed dark energy surveys, concentrating
on their ability to distinguish between the ΛCDM model and a
two-parameter dark energy model. The resulting plots indicate the
region of parameter space outside which the true model has to lie, in
order for the experiment to have sufficient statistical power to exclude
ΛCDM using model selection statistics.

An important caveat is that our plots do not show the effects
of systematics, which are likely to be the dominant uncertainty for
many of the experiments. This drawback is shared by parameter
error forecasts, and it is more or less the nature of systematic
uncertainties that they cannot be usefully modelled in advance of actual
observational data being obtained. In judging the true merit of an
experimental proposal, it is therefore essential to judge how well
structured it is for optimal removal of systematics, as well as
looking at its raw statistical power.

While we have focussed on dark energy as a specific application,
the concept of the Bayes factor plot has much broader
applicability, and is suitable for deployment in a wide range of
cosmological contexts.

APPENDIX A: MARGINALIZING OVER SIMULATED DATA

In the main body of the paper, we have plotted the Bayes factor
as a function of the fiducial values of the dark energy parameters,
assuming particular values for the other parameters in the fiducial
model. In general, however, the Bayes factor is a function of all
the fiducial model parameters, not just the ones of principal
interest, though a practical problem is that we cannot easily plot the
evidence ratio B_01 as a function of more than two variables θ̄.

One solution is marginalization over the parameters that we
are not interested in, as one does in parameter estimation, assuming
those parameters to lie within the range motivated by present data.
As the fiducial parameters θ̄ belong to the definition of the data, we
need to marginalize the evidences, P(D|M), rather than the Bayes
factors. Formally, the marginalization must take place in data space,
so when we wish to integrate out a 'nuisance' parameter which
the data is a function of, we should take into account a
transformation factor [Σ_i (dD_i/dθ̄)^2]^{1/2}, evaluated at each θ̄
along the integral. However, provided the evidence varies only weakly
over the relevant range of the fiducial models, or if our model depends
(nearly) linearly on its parameters, then this function is a constant
which cancels when computing the Bayes factor. In this case we
can just average the evidences. This will also conserve the relation
B_01 = 1/B_10.
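
The point about averaging evidences rather than Bayes factors can be illustrated with a few lines of code. The evidence values below are invented purely for illustration (five hypothetical fiducial values of a nuisance parameter); the sketch only demonstrates that the ratio of averaged evidences respects B_01 = 1/B_10 while an average of ratios generally does not.

```python
import numpy as np

# Hypothetical evidences P(D|M_0) and P(D|M_1) at five fiducial values
# of a nuisance parameter (made-up numbers, not survey results).
ev0 = np.array([1.0, 1.2, 0.9, 1.1, 1.0]) * 1e-3
ev1 = np.array([0.5, 0.8, 0.3, 0.7, 0.4]) * 1e-3

# Average the evidences first, then form the Bayes factor.
b01 = ev0.mean() / ev1.mean()
b10 = ev1.mean() / ev0.mean()

# Averaging the Bayes factors directly breaks the reciprocal relation.
b01_avg_of_ratios = np.mean(ev0 / ev1)
b10_avg_of_ratios = np.mean(ev1 / ev0)

print(b01 * b10)                              # reciprocity holds
print(b01_avg_of_ratios * b10_avg_of_ratios)  # exceeds 1 unless the ratio is constant
```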

In practice, the main determining factor in whether particular
extra parameters are justified by the data is the true values of those
parameters themselves, rather than values of the other parameters.
Often, then, one can choose fixed values of the uninteresting
parameters, presenting results on a slice through the fiducial
parameter space. Indeed, this turns out to be the case for the dark
energy surveys in this paper.


APPENDIX B: PARAMETER ERROR FORECASTING 


In this paper we are advocating the use of Bayes factors to quantify
the power of upcoming experiments, in place of parameter error
forecasts. For comparison, we provide a brief overview of parameter
error forecasting here, and discuss some of its features and
limitations.

The idea is to simulate a sample of experimental data and then
infer the parameter uncertainties using standard likelihood analysis.
More specifically, assuming a model M specified by a set of
parameters θ = {θ_μ}, a sample of data D with the expected experimental
errors is generated for a particular fiducial model with parameter
values θ̄. Then a likelihood P(D|θ, M) is computed and the
confidence intervals on the θ parameters are inferred by computing a
posterior parameter probability distribution via Bayes' rule:


\begin{equation}
P(\theta|D,M) = \frac{P(D|\theta,M)\,P(\theta|M)}{P(D|M)} . \tag{B1}
\end{equation}

Figure B1. The marginalized 68% and 95% confidence contours in the Ω_m-w
plane. The fiducial models are a ΛCDM with w = -1 and Ω_m = 0.3
(solid line), a dark energy model with w = -0.85 and Ω_m = 0.35
(dash-dot line), and a phantom model with w = -1.16 and Ω_m = 0.26
(dash line).


Figure B2. Marginalized 68% and 95% contours in the w_0-w_a plane. The
fiducial models are ΛCDM (solid line), a dark energy model with w_0 =
-0.8 and w_a = -1 (dash-dot line), and a phantom model with w_0 =
-1.4 and w_a = -0.2 (dash line). For all three models Ω_m = 0.3.


As a result the parameter uncertainties depend both on the
experimental characteristics and the choice of the fiducial model.

A simplified way of carrying out such an analysis is to use the
Fisher matrix approximation. By construction the fiducial model
parameter values θ̄ maximize the likelihood. Hence expanding
ln P(D|θ, M) to quadratic order in δθ = θ - θ̄ one obtains (Bond
1995; Tegmark, Taylor & Heavens 1997)

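\begin{equation}
P(D|\theta, M) \approx \exp\left( -\frac{1}{2} \sum_{\mu\nu}
\delta\theta_\mu F_{\mu\nu}\, \delta\theta_\nu \right) , \tag{B2}
\end{equation}

which is a Gaussian approximation to the likelihood with zero mean
and with variance given by the inverse of the Fisher matrix
F_μν, where

\begin{equation}
F_{\mu\nu} = \sum_{ij} C^{-1}_{ij}
\frac{\partial D_i}{\partial \theta_\mu}
\frac{\partial D_j}{\partial \theta_\nu} . \tag{B3}
\end{equation}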


The sum is over all measurements and the partial derivatives are
evaluated at the fiducial model parameter values θ̄. The matrix
C_ij is the data covariance matrix; for independent measurements
(e.g. different supernovae) it simplifies to σ²(D_i) δ_ij. The
parameter errors are then given by the square root of the diagonal
components of the parameter covariance matrix,
σ(θ_μ) = [(F^{-1})_μμ]^{1/2}. It is evident from equation (B3) that
more accurate data, characterized by smaller uncertainty σ(D), provide
larger Fisher matrix components, hence smaller parameter errors. It can
also be noticed that for a given experiment the parameters which are
better constrained are those for which the partial derivatives are larger.
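
As a minimal sketch of equation (B3) for independent measurements, the code below builds the Fisher matrix for a toy linear model D_i = a + b x_i and reads off the marginalized parameter errors. The model, the sample points and the error levels are illustrative assumptions and have nothing to do with the supernova calculation itself.

```python
import numpy as np


def fisher_matrix(derivs, sigma):
    """Fisher matrix for independent data.

    derivs: (n_data, n_params) array of dD_i/dtheta_mu at the fiducial model.
    sigma:  (n_data,) array of measurement uncertainties sigma(D_i).
    """
    weighted = derivs / sigma[:, None] ** 2
    return derivs.T @ weighted


def marginalized_errors(fisher):
    """Square roots of the diagonal of the inverse Fisher matrix."""
    return np.sqrt(np.diag(np.linalg.inv(fisher)))


# Toy model D_i = a + b * x_i, so the derivatives are 1 and x_i.
x = np.linspace(0.0, 1.0, 11)
derivs = np.column_stack([np.ones_like(x), x])

errors = marginalized_errors(fisher_matrix(derivs, np.full(x.size, 0.1)))
errors_precise = marginalized_errors(fisher_matrix(derivs, np.full(x.size, 0.05)))
print(errors, errors_precise)
```

Halving every measurement uncertainty quadruples the Fisher matrix and therefore halves the forecast parameter errors, as stated above.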

Since these derivatives are computed at the fiducial model, it is
natural to expect that the size of the projected errors varies for
different fiducial parameter values. These contours are usually plotted
with the aim of drawing a conclusion based on the true model
having different parameter values from those of the fiducial model. But
the dependence on the choice of fiducial model means that there is
no guarantee that the conclusions based on contours around e.g. the
ΛCDM model can be used to rule that model out.

As an explicit example we compute the Fisher matrix errors of
dark energy parameters from SN-Ia luminosity-distance
measurements. We assume experimental characteristics from the proposed
SNAP mission as discussed in Kim et al. (2004). We consider two
different dark energy models, one parametrized by a constant
equation of state parameter w, and a second by the two-parameter
w_0-w_a equation of state family used in the main text. We assume an
independent measurement of Ω_m with uncertainty ±0.03 to reduce
parameter degeneracies, and compute the marginalized confidence contours
around different fiducial models in the Ω_m-w and w_0-w_a planes
respectively.

In Figure B1 we plot the 68% and 95% ellipses around three
models: a ΛCDM model with w = -1 and Ω_m = 0.3, a dark
energy model with w = -0.88 and Ω_m = 0.35, and a phantom
model with w = -1.16 and Ω_m = 0.26. The alignment of the
strongest degeneracy line differs amongst the models. This is
because the degeneracy in the w-Ω_m plane is not a straight line,
but rather a curve (see for instance Weller & Albrecht 2001).
Notice also that the ellipses around the fiducial models have different
sizes. As Fig. B1 shows, if the true model lies on, say, the 95%
confidence limit of the ΛCDM data, one cannot necessarily presume
that the ΛCDM model would lie on the 95% confidence limit of
data simulated for the true model. It is possible to compute a
contour indicating the locus of the fiducial models for which ΛCDM
lies at their 95% confidence limit, and indeed such a locus is shown
in Figure 1 of Kratochvil et al. (2004), but constructing it is a rather
cumbersome procedure.

This drawback turns out to be less severe for dark energy models
parametrized by the two-parameter w_0-w_a equation of state. In
Figure B2 we plot the 68% and 95% ellipses in the w_0-w_a plane with
Ω_m = 0.3 around a ΛCDM model, a phantom model with w_0 = -0.8 and
w_a = -1 lying along the degeneracy line of the ΛCDM, and a constant
phantom model with w_0 = -1.4 and w_a = -0.2. The dependence on
the fiducial model is still present, since the ellipses become larger
as the fiducial model shifts orthogonal to the principal degeneracy
direction towards more negative equation of state values. Fiducial
models along the same degeneracy line whose 95% contours
include the ΛCDM model are within the 95% ellipse of the ΛCDM
as well.

This paper has been typeset from a TeX/LaTeX file prepared by the
author.